Diffusion model

In machine learning, diffusion models, also known as diffusion probabilistic models or score-based generative models, are a class of latent variable generative models. A diffusion model consists of three major components: the forward process, the reverse process, and the sampling procedure.^[1] The goal of diffusion models is to learn a diffusion process that generates a probability distribution for a given dataset from which we can then sample new images. They learn the latent structure of a dataset by modeling the way in which data points diffuse through their latent space.^[2]

In the case of computer vision, diffusion models can be applied to a variety of tasks, including image denoising, inpainting, super-resolution, and image generation. This typically involves training a neural network to sequentially denoise images blurred with Gaussian noise.^[2]^[3] The model is trained to reverse the process of adding noise to an image. After training to convergence, it can be used for image generation by starting with an image composed of random noise for the network to iteratively denoise. Announced on 13 April 2022, OpenAI's text-to-image model DALL-E 2 is an example that uses diffusion models for both the model's prior (which produces an image embedding given a text caption) and the decoder that generates the final image.^[4] Diffusion models have recently found applications in natural language processing (NLP),^[5] particularly in areas like text generation^[6]^[7] and summarization.^[8]

Diffusion models are typically formulated as markov chains and trained using variational inference.^[9] Examples of generic diffusion modeling frameworks used in computer vision are denoising diffusion probabilistic models, noise conditioned score networks, and stochastic differential equations.^[10]

^ Chang, Ziyi; Koulieris, George Alex; Shum, Hubert P. H. (2023). "On the Design Fundamentals of Diffusion Models: A Survey". arXiv:2306.04542 [cs.LG].
^ ^a ^b Song, Yang; Sohl-Dickstein, Jascha; Kingma, Diederik P.; Kumar, Abhishek; Ermon, Stefano; Poole, Ben (2021-02-10). "Score-Based Generative Modeling through Stochastic Differential Equations". arXiv:2011.13456 [cs.LG].
^ Gu, Shuyang; Chen, Dong; Bao, Jianmin; Wen, Fang; Zhang, Bo; Chen, Dongdong; Yuan, Lu; Guo, Baining (2021). "Vector Quantized Diffusion Model for Text-to-Image Synthesis". arXiv:2111.14822 [cs.CV].
^ Cite error: The named reference dalle2 was invoked but never defined (see the help page).
^ Li, Yifan; Zhou, Kun; Zhao, Wayne Xin; Wen, Ji-Rong (August 2023). "Diffusion Models for Non-autoregressive Text Generation: A Survey". Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization. pp. 6692–6701. arXiv:2303.06574. doi:10.24963/ijcai.2023/750. ISBN 978-1-956792-03-4.
^ Han, Xiaochuang; Kumar, Sachin; Tsvetkov, Yulia (2023). "SSD-LM: Semi-autoregressive Simplex-based Diffusion Language Model for Text Generation and Modular Control". Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA, USA: Association for Computational Linguistics: 11575–11596. arXiv:2210.17432. doi:10.18653/v1/2023.acl-long.647.
^ Xu, Weijie; Hu, Wenxiang; Wu, Fanyou; Sengamedu, Srinivasan (2023). "DeTiME: Diffusion-Enhanced Topic Modeling using Encoder-decoder based LLM". Findings of the Association for Computational Linguistics: EMNLP 2023. Stroudsburg, PA, USA: Association for Computational Linguistics: 9040–9057. arXiv:2310.15296. doi:10.18653/v1/2023.findings-emnlp.606.
^ Zhang, Haopeng; Liu, Xiao; Zhang, Jiawei (2023). "DiffuSum: Generation Enhanced Extractive Summarization with Diffusion". Findings of the Association for Computational Linguistics: ACL 2023. Stroudsburg, PA, USA: Association for Computational Linguistics: 13089–13100. arXiv:2305.01735. doi:10.18653/v1/2023.findings-acl.828.
^ Cite error: The named reference ho was invoked but never defined (see the help page).
^ Croitoru, Florinel-Alin; Hondru, Vlad; Ionescu, Radu Tudor; Shah, Mubarak (2023). "Diffusion Models in Vision: A Survey". IEEE Transactions on Pattern Analysis and Machine Intelligence. 45 (9): 10850–10869. arXiv:2209.04747. doi:10.1109/TPAMI.2023.3261988. PMID 37030794. S2CID 252199918.

[chang23design-1] Chang, Ziyi; Koulieris, George Alex; Shum, Hubert P. H. (2023). "On the Design Fundamentals of Diffusion Models: A Survey". arXiv:2306.04542 [cs.LG].

[song-2] Song, Yang; Sohl-Dickstein, Jascha; Kingma, Diederik P.; Kumar, Abhishek; Ermon, Stefano; Poole, Ben (2021-02-10). "Score-Based Generative Modeling through Stochastic Differential Equations". arXiv:2011.13456 [cs.LG].

[gu-3] Gu, Shuyang; Chen, Dong; Bao, Jianmin; Wen, Fang; Zhang, Bo; Chen, Dongdong; Yuan, Lu; Guo, Baining (2021). "Vector Quantized Diffusion Model for Text-to-Image Synthesis". arXiv:2111.14822 [cs.CV].

[dalle2-4] Cite error: The named reference dalle2 was invoked but never defined (see the help page).

[5] Li, Yifan; Zhou, Kun; Zhao, Wayne Xin; Wen, Ji-Rong (August 2023). "Diffusion Models for Non-autoregressive Text Generation: A Survey". Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization. pp. 6692–6701. arXiv:2303.06574. doi:10.24963/ijcai.2023/750. ISBN 978-1-956792-03-4.

[6] Han, Xiaochuang; Kumar, Sachin; Tsvetkov, Yulia (2023). "SSD-LM: Semi-autoregressive Simplex-based Diffusion Language Model for Text Generation and Modular Control". Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA, USA: Association for Computational Linguistics: 11575–11596. arXiv:2210.17432. doi:10.18653/v1/2023.acl-long.647.

[7] Xu, Weijie; Hu, Wenxiang; Wu, Fanyou; Sengamedu, Srinivasan (2023). "DeTiME: Diffusion-Enhanced Topic Modeling using Encoder-decoder based LLM". Findings of the Association for Computational Linguistics: EMNLP 2023. Stroudsburg, PA, USA: Association for Computational Linguistics: 9040–9057. arXiv:2310.15296. doi:10.18653/v1/2023.findings-emnlp.606.

[8] Zhang, Haopeng; Liu, Xiao; Zhang, Jiawei (2023). "DiffuSum: Generation Enhanced Extractive Summarization with Diffusion". Findings of the Association for Computational Linguistics: ACL 2023. Stroudsburg, PA, USA: Association for Computational Linguistics: 13089–13100. arXiv:2305.01735. doi:10.18653/v1/2023.findings-acl.828.

[ho-9] Cite error: The named reference ho was invoked but never defined (see the help page).

[10] Croitoru, Florinel-Alin; Hondru, Vlad; Ionescu, Radu Tudor; Shah, Mubarak (2023). "Diffusion Models in Vision: A Survey". IEEE Transactions on Pattern Analysis and Machine Intelligence. 45 (9): 10850–10869. arXiv:2209.04747. doi:10.1109/TPAMI.2023.3261988. PMID 37030794. S2CID 252199918.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]